Graph technology applied to editing structured natural-language documents

Authors

  • Felix H. Gatzemeier
  • Oliver Meyer
Abstract

This is a report on our experiences over the last four years of applying graph technology to the creation and maintenance of structured digital documents. Our aim is to contribute to the development of tools for authors of documents with an inherent content structure, such as scientific articles or textbooks, that help build and maintain that structure. One goal of structural integrity is, for example, referring to terms only after they have been defined, and then referring to them consistently with the definition. Another goal is for each section and subsection to play a clear and discernible role in the document. The document data needed to help achieve these goals are the location and content of definitions, as well as the location of references. More data may be used to offer more functionality, for example the intended style of a presentation (inductive or deductive as prime alternatives), which would have to be consistent with the relationships of terms and the order of their occurrences in the text.

We call the general structure of content in the document the content structure. In the aforementioned form, it consists of concepts that may have various relationships among each other, for example one being a part of another, or being required for the understanding of another concept. Elements of the content structure may be connected to the visible parts of the document, for example as being defined or referred to there. We call these visible parts the presentation. This corresponds to the level at which current word processors regard a document: a hierarchy of sections with typographically marked-up text and other media elements. We call these constituent parts the document hierarchy of divisions on the one hand and the media content on the other.

Our tools provide means to construct documents containing content structure as well as the presentation, and to check them according to rules of readable structuring. They are meant to provide technology, not policy, in the sense that the author should be free to work with or without content-structural support, solely at the author's own discretion. Divisions, concepts, definitions, references and concept relationships naturally form a graph. We have therefore used a graph database and a graph-based programming
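The define-before-refer rule mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' tool, graph database, or programming language: the class `DocumentGraph`, the edge kinds `"defines"` and `"refers_to"`, and the method `premature_references` are hypothetical names chosen for illustration. The sketch also assumes that positions are tracked only at division granularity, so a reference in the same division as the definition is accepted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DocumentGraph:
    """Toy content-structure graph (hypothetical, for illustration only).

    Nodes are divisions (kept in reading order) and concepts; edges link a
    division to a concept it either defines or refers to.
    """

    divisions: list[str] = field(default_factory=list)
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def add_division(self, name: str) -> None:
        self.divisions.append(name)

    def link(self, division: str, kind: str, concept: str) -> None:
        """Add a 'defines' or 'refers_to' edge from a division to a concept."""
        if division not in self.divisions:
            raise ValueError(f"unknown division: {division}")
        if kind not in {"defines", "refers_to"}:
            raise ValueError(f"unknown edge kind: {kind}")
        self.edges.append((division, kind, concept))

    def premature_references(self) -> list[tuple[str, str]]:
        """Return (division, concept) pairs where the concept is referred to
        before the division of its first definition, or is never defined."""
        order = {name: i for i, name in enumerate(self.divisions)}
        first_definition: dict[str, int] = {}
        for division, kind, concept in self.edges:
            if kind == "defines":
                pos = order[division]
                first_definition[concept] = min(pos, first_definition.get(concept, pos))
        problems = []
        for division, kind, concept in self.edges:
            if kind != "refers_to":
                continue
            defined_at = first_definition.get(concept)
            if defined_at is None or order[division] < defined_at:
                problems.append((division, concept))
        return problems


doc = DocumentGraph()
for name in ["1 Introduction", "2 Basics", "3 Method"]:
    doc.add_division(name)
doc.link("1 Introduction", "refers_to", "content structure")
doc.link("2 Basics", "defines", "content structure")
doc.link("3 Method", "refers_to", "content structure")
print(doc.premature_references())
```

In this example only the reference in "1 Introduction" is reported, because it precedes the defining division. In keeping with the "technology, not policy" stance, such a check would only report findings and leave the decision to the author.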

Similar resources

Graph Technology for Structured Documents

In the context of integrated software engineering environments, graphs and transformations on them have proven to be a useful technology for describing the structure of and operations on documents in formal languages. We discuss ways of applying this graph technology to more general structured documents. Existing editing environments for general structured documents offer only basic structure-o...

An Optimal Approach to Local and Global Text Coherence Evaluation Combining Entity-based, Graph-based and Entropy-based Approaches

Text coherence evaluation becomes a vital and lovely task in Natural Language Processing subfields, such as text summarization, question answering, text generation and machine translation. Existing methods like entity-based and graph-based models are engaging with nouns and noun phrases change role in sequential sentences within short part of a text. They even have limitations in global coheren...

Scalable Graph-Based Learning Applied to Human Language Technology

Andrei Alexandrescu. Chair of the Supervisory Committee: Associate Research Professor Katrin Kirchhoff, Electrical Engineering. Graph-based semi-supervised learning techniques have recently attracted increasing attention as a means to utilize unlabeled data in machine learning by placing data points in a similarity graph. However, ...

Information Extraction in Medical Domain Using Ontology and Knowledge Graphs

Medical documents contain lots of information which can be useful to build many health related applications. Since medical documents present unstructured information in nonstandard natural language so it is difficult to extract this information and present in a structured manner. We propose a model named "Feature Based Relation Extraction with Relational Learning using Medical Ontology" which m...

A General Framework for Information Processing with Application to Quantitative Software Testing

We propose in this paper a general framework for collecting and processing data from various sources of information. An advantage of our framework over traditional data mining frameworks is the ability to collect nuggets of information from sources encoded in different formats. We can handle documents or fragments of knowledge written in natural language (open structure documents), XML-like dat...

Publication date: 2002